Take_home_Ex1

Modified

December 2, 2023

Overview

With the recent shift towards digital payments, companies and organisations can more easily collect data and information linked to consumer habits. The transportation industry, including public transport such as buses, has also leaned into this phenomenon. The information collected includes travel patterns that can help companies plan more efficient routes or anticipate where heavy ridership is to be expected.

Objectives

Exploratory Spatial Data Analysis (ESDA) holds tremendous potential to address complex problems facing society. In this study, you are tasked to apply appropriate Local Indicators of Spatial Association (LISA) and Emerging Hot Spot Analysis (EHSA) methods to uncover the spatial and spatio-temporal mobility patterns of public bus passengers in Singapore.

Task

Here we will utilise bus ridership data at different time intervals, plot it as geospatial data, and analyse it using various statistical tools.

Geovisualisation and Analysis

Compute passenger trips at the hexagonal level for the following time intervals:

  • Weekday morning peak, 6am to 9am
  • Weekday afternoon peak, 5pm to 8pm
  • Weekend/holiday morning peak, 11am to 2pm
  • Weekend/holiday evening peak, 4pm to 7pm

Display the geographical distribution using choropleth maps of the hexagons.

Combine the passenger trips made at all of the bus stops within a hexagon.

Local Indicators of Spatial Association (LISA) Analysis

Utilise Queen’s contiguity to perform LISA on the passenger trips by origin at the hexagonal level. Display the LISA maps of the passenger trips at the hexagonal level.

Load Packages and Data

Load packages

Here we will load the packages needed for this exercise:

  • sf: importing, managing and processing vector geospatial data
  • tmap: creating thematic (choropleth) maps
  • spdep: spatial weights and spatial dependence statistics
  • tidyverse: general data import, wrangling and visualisation
  • dplyr: data manipulation (part of the tidyverse)
  • mapview: quick interactive viewing of spatial data
  • sfdep: sf-friendly interface for spatial dependence analysis (LISA, EHSA)
  • stplanr: transport planning tools such as desire lines

pacman::p_load(sf,tmap,spdep,tidyverse, dplyr, mapview, sfdep, stplanr)

Loading data

Loading aspatial table

Here we will read the origin-destination bus ridership data for October 2023 and assign it to the odbus variable.

odbus <- read_csv("data/aspatial/origin_destination_bus_202310.csv")

We will then extract the information from the following and assign them to different variables.

Day                Duration    Variable name
Weekdays           6am - 9am   weekdayAM_6_9
Weekdays           5pm - 8pm   weekdayPM_5_8
Weekends/Holidays  11am - 2pm  weekendAM_11_14
Weekends/Holidays  4pm - 7pm   weekendPM_16_19

weekdayAM_6_9 <- odbus %>% 
  filter(DAY_TYPE == "WEEKDAY") %>%
  filter(TIME_PER_HOUR >= 6 & TIME_PER_HOUR <= 9) %>%
  group_by(ORIGIN_PT_CODE) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS))

weekdayPM_5_8 <- odbus %>% 
  filter(DAY_TYPE == "WEEKDAY") %>%
  filter(TIME_PER_HOUR >= 17 & TIME_PER_HOUR <= 20) %>%
  group_by(ORIGIN_PT_CODE) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS))

weekendAM_11_14 <- odbus %>% 
  filter(DAY_TYPE == "WEEKENDS/HOLIDAY") %>%
  filter(TIME_PER_HOUR >= 11 & TIME_PER_HOUR <= 14) %>%
  group_by(ORIGIN_PT_CODE) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS))

weekendPM_16_19 <- odbus %>% 
  filter(DAY_TYPE == "WEEKENDS/HOLIDAY") %>%
  filter(TIME_PER_HOUR >= 16 & TIME_PER_HOUR <= 19) %>%
  group_by(ORIGIN_PT_CODE) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS))

Loading Geospatial data

Next we will import all of the bus stops and their coordinates and attach them to the busstop variable.

busstop <- st_read(dsn = "data/geospatial", layer = "BusStop") %>%
  st_transform(crs = 3414)
Reading layer `BusStop' from data source 
  `C:\Users\Lian Khye\Desktop\MITB\Geospatial\geartooth\ISSS624\Take_home_Ex1\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 5161 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21

We will first rename the bus stop column title for easier data joining.

colnames(busstop)[colnames(busstop) == "BUS_STOP_N"] <- "ORIGIN_PT_CODE"

After that we will create the hexagonal grid that forms the map layout. Each hexagon will be created with a cell size of 250 m. Every hexagon is also given a grid_id that can be used to identify each individual grid cell.

center <- st_centroid(busstop)

area_honeycomb_grid <- st_make_grid(center, cellsize = c(250 * sqrt(3), 250 * 2), what = "polygons", square = FALSE)
honeycomb_grid_sf <- st_sf(area_honeycomb_grid) %>%
  mutate(grid_id = 1:length(lengths(area_honeycomb_grid)))

Data processing

Assigning individual bus stop to hexagons

First we will assign the bus stop point geometry data to each polygon using st_intersection(). The function assigns all of the points to a polygon by the point-set intersection of two geometries. Additional information here.

busstop_hex <- st_intersection(busstop, honeycomb_grid_sf) %>%
  st_drop_geometry()

Duplication check

Here we will check for the presence of any duplication before we further process the data.

duplicate1 <- weekdayAM_6_9 %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()

duplicate2 <- weekdayPM_5_8 %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()

duplicate3 <- weekendAM_11_14 %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()

duplicate4 <- weekendPM_16_19 %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()

We can check which data points are duplicated; the empty output below shows that there are none.

c(duplicate1,duplicate2,duplicate3,duplicate4)
$ORIGIN_PT_CODE
character(0)

$TRIPS
numeric(0)

$ORIGIN_PT_CODE
character(0)

$TRIPS
numeric(0)

$ORIGIN_PT_CODE
character(0)

$TRIPS
numeric(0)

$ORIGIN_PT_CODE
character(0)

$TRIPS
numeric(0)

Finally, we keep only the unique data points using the unique() function.

unique_weekdayAM <- unique(weekdayAM_6_9)
unique_weekdayPM <- unique(weekdayPM_5_8)
unique_weekendAM <- unique(weekendAM_11_14)
unique_weekendPM <- unique(weekendPM_16_19)

Trip tabulation

Next we will join the trip totals for each time interval that we created earlier to the busstop_hex variable, using the ORIGIN_PT_CODE column that they have in common. The total number of trips made from each hexagon is then summed up and placed in a new column named TOT_TRIPS.

count_weekdayAM_6_9 <- left_join(unique_weekdayAM , busstop_hex) %>%
  group_by(grid_id) %>%
  summarise(TOT_TRIPS = sum(TRIPS))

count_weekdayPM_5_8 <- left_join(unique_weekdayPM , busstop_hex) %>%
  group_by(grid_id) %>%
  summarise(TOT_TRIPS = sum(TRIPS))

count_weekendAM_11_14 <- left_join(unique_weekendAM , busstop_hex) %>%
  group_by(grid_id) %>%
  summarise(TOT_TRIPS = sum(TRIPS))

count_weekendPM_16_19 <- left_join(unique_weekendPM , busstop_hex) %>%
  group_by(grid_id) %>%
  summarise(TOT_TRIPS = sum(TRIPS))

Reassign polygon information

We will then reattach the polygon geometry from the hexagonal grid that we created earlier.

poly_weekdayAM_6_9 <- left_join(honeycomb_grid_sf,count_weekdayAM_6_9)
poly_weekdayPM_5_8 <- left_join(honeycomb_grid_sf,count_weekdayPM_5_8)
poly_weekendAM_11_14 <- left_join(honeycomb_grid_sf,count_weekendAM_11_14)
poly_weekendPM_16_19 <- left_join(honeycomb_grid_sf,count_weekendPM_16_19)

Filter for empty trips

Following that we will filter out hexagons that have no trips, keeping only valid hexagons for mapping.

grid_weekdayAM <- poly_weekdayAM_6_9 %>%
  filter(TOT_TRIPS > 0)

grid_weekdayPM <- poly_weekdayPM_5_8 %>%
  filter(TOT_TRIPS > 0)

grid_weekendAM <- poly_weekendAM_11_14 %>%
  filter(TOT_TRIPS > 0)

grid_weekendPM <- poly_weekendPM_16_19 %>%
  filter(TOT_TRIPS > 0)

Choropleth map

Here we will plot the choropleth maps for the different time intervals. We will be using tmap_mode("view") to create an interactive map. Although we will code in accessories such as the compass, they will not be displayed in the interactive map. However, by writing them in now, we can display them in subsequent maps once we switch to plot mode.

tmap_mode("view")
mapA <- tm_shape(grid_weekdayAM)+
  tm_fill("TOT_TRIPS", 
          style = "quantile", 
          palette = "Blues",
          title = "Passenger trips") +
  tm_layout(main.title = "Passenger trips generated Weekday 6am-9am",
            main.title.position = "center",
            main.title.size = 0.7,
            legend.height = 0.45, 
            legend.width = 0.35,
            frame = TRUE) +
  tm_borders(alpha = 0.5) +
  tm_compass(type="8star", size = 1) +
  tm_scale_bar() +
  tm_grid(alpha =0.2) +
  tm_credits("Source: Planning Sub-zone boundary from URA\n and Passenger trips data from LTA", 
             position = c("left", "bottom"))
mapA

Total ridership ranges from 1 to 357043 trips per hexagon. The range of the data is divided into quantile bands for a clearer distinction between the ridership of each hexagon.
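As a quick sanity check on the legend, we can approximate the class breaks that tmap derives for style = "quantile" (a sketch; assumes grid_weekdayAM from above and tmap's default of five classes):

```r
# Five quantile classes need six break points (0%, 20%, ..., 100%)
quantile(grid_weekdayAM$TOT_TRIPS, probs = seq(0, 1, length.out = 6))
```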

mapB <- tm_shape(grid_weekdayPM)+
  tm_fill("TOT_TRIPS", 
          style = "quantile", 
          palette = "Reds",
          title = "Passenger trips") +
  tm_layout(main.title = "Passenger trips generated Weekday 5pm-8pm",
            main.title.position = "center",
            main.title.size = 0.7,
            legend.height = 0.45, 
            legend.width = 0.35,
            frame = TRUE) +
  tm_borders(alpha = 0.5) +
  tm_compass(type="8star", size = 1) +
  tm_scale_bar() +
  tm_grid(alpha =0.2) +
  tm_credits("Source: Planning Sub-zone boundary from URA\n and Passenger trips data from LTA", 
             position = c("left", "bottom"))
mapB

Total ridership ranges from 1 to 568845 trips per hexagon.

mapC <- tm_shape(grid_weekendAM)+
  tm_fill("TOT_TRIPS", 
          style = "quantile", 
          palette = "Greens",
          title = "Passenger trips") +
  tm_layout(main.title = "Passenger trips generated Weekend/holidays 11am-2pm",
            main.title.position = "center",
            main.title.size = 0.7,
            legend.height = 0.45, 
            legend.width = 0.35,
            frame = TRUE) +
  tm_borders(alpha = 0.5) +
  tm_compass(type="8star", size = 1) +
  tm_scale_bar() +
  tm_grid(alpha =0.2) +
  tm_credits("Source: Planning Sub-zone boundary from URA\n and Passenger trips data from LTA", 
             position = c("left", "bottom"))
mapC

Total ridership ranges from 1 to 117609 trips per hexagon.

mapD <- tm_shape(grid_weekendPM)+
  tm_fill("TOT_TRIPS", 
          style = "quantile", 
          palette = "Purples",
          title = "Passenger trips") +
  tm_layout(main.title = "Passenger trips generated Weekend/holidays 4pm-7pm",
            main.title.position = "center",
            main.title.size = 0.7,
            legend.height = 0.45, 
            legend.width = 0.35,
            frame = TRUE) +
  tm_borders(alpha = 0.5) +
  tm_compass(type="8star", size = 1) +
  tm_scale_bar() +
  tm_grid(alpha =0.2) +
  tm_credits("Source: Planning Sub-zone boundary from URA\n and Passenger trips data from LTA", 
             position = c("left", "bottom"))
mapD

Total ridership ranges from 1 to 114410 trips per hexagon.

Plot maps

We can also utilise a plot mode by using tmap_mode(“plot”).

tmap_mode("plot")
mapA

This renders the accessories such as the compass and scale bar, which might be useful depending on the application or use of the map.

Choropleth map discussion

We can see that total ridership over the weekends is lower than its weekday counterpart, despite both periods being classified as peak hours. This difference is likely due to people travelling to or from work on weekdays, whereas on weekends there is less traffic as people stay at home or travel to districts that are more accessible by other modes of transport such as trains.

LISA

Creating neighbour count column

Before we calculate LISA, we will derive the list of neighbours for each hexagon in the various grids. We will use st_contiguity() to add an nb (neighbour) column to each dataset under a new variable name. By default, st_contiguity() uses the Queen method of contiguity to determine the neighbours. Note that in the chunks below, area_honeycomb_grid refers to the geometry column of each filtered grid (the name it inherited from st_sf()), not the full grid object.

wm_qA <- grid_weekdayAM %>%
    mutate(nb = st_contiguity(area_honeycomb_grid),
         .before = 1)

wm_qB <- grid_weekdayPM %>%
  mutate(nb = st_contiguity(area_honeycomb_grid),
         .before = 1)

wm_qC <- grid_weekendAM %>%
  mutate(nb = st_contiguity(area_honeycomb_grid),
         .before = 1)

wm_qD <- grid_weekendPM %>%
  mutate(nb = st_contiguity(area_honeycomb_grid),
         .before = 1)

Here we can use the summary() function to take a look at the different regions and the neighbours that they have.

summary(wm_qA$nb)
Neighbour list object:
Number of regions: 1779 
Number of nonzero links: 7672 
Percentage nonzero weights: 0.2424134 
Average number of links: 4.312535 
14 regions with no links:
1 311 358 489 526 651 765 837 1226 1753 1756 1760 1769 1779
Link number distribution:

  0   1   2   3   4   5   6 
 14  54 165 250 392 454 450 
54 least connected regions:
2 3 8 22 23 35 36 56 65 70 110 126 172 199 200 209 210 218 235 251 347 373 652 664 698 735 756 806 818 819 844 846 861 898 927 983 1168 1180 1181 1183 1211 1261 1263 1324 1434 1497 1506 1523 1542 1554 1629 1663 1761 1762 with 1 link
450 most connected regions:
29 48 71 78 81 82 87 89 94 95 112 119 124 125 131 135 141 144 148 178 185 186 187 194 195 196 204 205 206 216 217 224 230 241 247 249 256 258 259 265 269 271 272 276 277 278 283 284 285 316 320 331 337 339 340 341 344 349 350 353 354 355 356 362 365 366 369 381 382 383 385 386 397 398 401 408 410 411 427 436 440 457 465 471 474 481 486 487 496 502 503 504 510 514 515 516 522 528 530 531 539 546 558 559 568 572 579 580 583 584 589 590 601 612 613 615 621 624 625 626 631 632 635 636 641 646 647 654 655 659 666 667 670 671 673 674 683 684 687 690 691 700 701 705 710 711 712 717 724 725 730 731 737 742 743 748 752 761 762 766 771 780 781 786 792 797 798 801 809 810 813 821 822 826 827 835 840 849 855 866 867 870 872 882 883 889 900 901 903 907 910 912 916 917 932 936 937 938 952 953 957 958 959 965 970 976 977 978 979 980 981 984 985 991 997 1000 1001 1002 1005 1006 1012 1015 1016 1019 1020 1022 1023 1024 1025 1031 1032 1036 1037 1038 1039 1041 1042 1043 1048 1057 1058 1061 1062 1063 1066 1067 1071 1076 1080 1081 1088 1089 1090 1092 1096 1097 1098 1100 1103 1104 1105 1111 1114 1115 1118 1121 1131 1132 1134 1136 1145 1146 1147 1148 1159 1160 1161 1172 1173 1179 1185 1186 1187 1204 1206 1207 1209 1221 1222 1223 1229 1230 1231 1232 1233 1239 1240 1244 1245 1246 1251 1255 1257 1258 1271 1274 1282 1283 1284 1287 1289 1290 1296 1297 1298 1303 1305 1311 1316 1317 1318 1322 1329 1330 1335 1337 1338 1339 1343 1349 1350 1356 1361 1363 1366 1368 1369 1371 1375 1382 1383 1384 1385 1386 1387 1391 1394 1395 1396 1397 1398 1403 1405 1408 1409 1410 1411 1418 1419 1422 1423 1424 1425 1426 1427 1429 1431 1437 1438 1439 1447 1451 1452 1453 1461 1465 1470 1484 1487 1495 1498 1502 1508 1513 1516 1536 1537 1544 1545 1549 1551 1558 1559 1564 1565 1573 1574 1582 1583 1585 1586 1587 1591 1594 1595 1598 1599 1601 1602 1608 1611 1612 1617 1620 1621 1622 1623 1627 1635 1644 1651 1656 1658 1659 1667 1668 1670 1671 1674 1681 1683 1684 1692 1694 1696 1698 1700 1704 1705 1706 1708 1710 1711 1715 1718 
1724 1725 1728 1729 1732 with 6 links

We see that there are 1779 data points and they have an average of 4.3 neighbours. There are 14 hexagons with 0 neighbours and 450 hexagons with 6 neighbours.
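The link-number distribution printed by summary() can also be reproduced with spdep::card(), which returns the neighbour count of every region in an nb object (a sketch; assumes wm_qA from above):

```r
# card() returns one neighbour count per region; table() tallies them
table(spdep::card(wm_qA$nb))
```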

summary(wm_qB$nb)
Neighbour list object:
Number of regions: 1780 
Number of nonzero links: 7680 
Percentage nonzero weights: 0.2423936 
Average number of links: 4.314607 
14 regions with no links:
1 311 358 489 526 651 765 837 1226 1754 1757 1761 1770 1780
Link number distribution:

  0   1   2   3   4   5   6 
 14  53 164 251 394 454 450 
53 least connected regions:
2 3 8 22 23 35 36 56 65 70 110 126 172 199 200 201 218 219 235 251 347 373 652 664 698 735 756 806 818 819 844 846 861 898 927 983 1168 1180 1181 1183 1211 1261 1263 1324 1434 1497 1506 1543 1555 1630 1664 1762 1763 with 1 link
450 most connected regions:
29 48 71 78 81 82 87 89 94 95 112 119 124 125 131 135 141 144 148 178 185 186 187 194 195 196 205 206 207 216 217 224 230 241 247 249 256 258 259 265 269 271 272 276 277 278 283 284 285 316 320 331 337 339 340 341 344 349 350 353 354 355 356 362 365 366 369 381 382 383 385 386 397 398 401 408 410 411 427 436 440 457 465 471 474 481 486 487 496 502 503 504 510 514 515 516 522 528 530 531 539 546 558 559 568 572 579 580 583 584 589 590 601 612 613 615 621 624 625 626 631 632 635 636 641 646 647 654 655 659 666 667 670 671 673 674 683 684 687 690 691 700 701 705 710 711 712 717 724 725 730 731 737 742 743 748 752 761 762 766 771 780 781 786 792 797 798 801 809 810 813 821 822 826 827 835 840 849 855 866 867 870 872 882 883 889 900 901 903 907 910 912 916 917 932 936 937 938 952 953 957 958 959 965 970 976 977 978 979 980 981 984 985 991 997 1000 1001 1002 1005 1006 1012 1015 1016 1019 1020 1022 1023 1024 1025 1031 1032 1036 1037 1038 1039 1041 1042 1043 1048 1057 1058 1061 1062 1063 1066 1067 1071 1076 1080 1081 1088 1089 1090 1092 1096 1097 1098 1100 1103 1104 1105 1111 1114 1115 1118 1121 1131 1132 1134 1136 1145 1146 1147 1148 1159 1160 1161 1172 1173 1179 1185 1186 1187 1204 1206 1207 1209 1221 1222 1223 1229 1230 1231 1232 1233 1239 1240 1244 1245 1246 1251 1255 1257 1258 1271 1274 1282 1283 1284 1287 1289 1290 1296 1297 1298 1303 1305 1311 1316 1317 1318 1322 1329 1330 1335 1337 1338 1339 1343 1349 1350 1356 1361 1363 1366 1368 1369 1371 1375 1382 1383 1384 1385 1386 1387 1391 1394 1395 1396 1397 1398 1403 1405 1408 1409 1410 1411 1418 1419 1422 1423 1424 1425 1426 1427 1429 1431 1437 1438 1439 1447 1451 1452 1453 1461 1465 1470 1484 1487 1495 1498 1502 1508 1513 1516 1537 1538 1545 1546 1550 1552 1559 1560 1565 1566 1574 1575 1583 1584 1586 1587 1588 1592 1595 1596 1599 1600 1602 1603 1609 1612 1613 1618 1621 1622 1623 1624 1628 1636 1645 1652 1657 1659 1660 1668 1669 1671 1672 1675 1682 1684 1685 1693 1695 1697 1699 1701 1705 1706 1707 1709 1711 1712 1716 1719 
1725 1726 1729 1730 1733 with 6 links

We see that there are 1780 data points and they have an average of 4.3 neighbours. There are 14 hexagons with 0 neighbours and 450 hexagons with 6 neighbours.

summary(wm_qC$nb)
Neighbour list object:
Number of regions: 1786 
Number of nonzero links: 7702 
Percentage nonzero weights: 0.2414574 
Average number of links: 4.31243 
13 regions with no links:
288 357 487 524 649 762 834 1224 1760 1763 1767 1776 1786
Link number distribution:

  0   1   2   3   4   5   6 
 13  56 164 253 395 451 454 
56 least connected regions:
1 2 7 21 22 34 35 55 64 69 109 125 150 170 196 197 205 206 214 230 246 310 346 372 650 661 695 732 753 803 815 816 841 843 858 895 924 980 1166 1178 1179 1209 1259 1261 1322 1434 1444 1495 1498 1507 1546 1559 1636 1670 1768 1769 with 1 link
454 most connected regions:
28 47 70 77 80 81 86 88 93 94 111 118 123 124 130 134 140 143 146 176 183 184 185 191 192 193 201 202 203 212 213 219 225 236 242 244 251 253 254 260 264 266 267 271 272 273 278 279 280 315 319 330 336 338 339 340 343 348 349 352 353 354 355 361 364 365 368 380 381 382 384 385 396 397 400 407 409 410 426 435 439 456 464 469 472 479 484 485 494 500 501 502 508 512 513 514 520 526 528 529 537 544 556 557 566 570 577 578 581 582 587 588 599 610 611 613 619 622 623 624 629 630 633 634 639 644 645 651 652 656 663 664 667 668 670 671 680 681 684 687 688 697 698 702 707 708 709 714 721 722 727 728 734 739 740 745 749 758 759 763 768 777 778 783 789 794 795 798 806 807 810 818 819 823 824 832 837 846 852 863 864 867 869 879 880 886 897 898 900 904 907 909 913 914 929 933 934 935 949 950 954 955 956 962 967 973 974 975 976 977 978 981 982 988 994 997 998 999 1002 1003 1009 1012 1013 1016 1017 1019 1020 1021 1022 1028 1029 1033 1034 1035 1036 1038 1039 1040 1045 1054 1055 1058 1059 1060 1063 1064 1068 1073 1077 1078 1085 1086 1087 1089 1093 1094 1095 1097 1100 1101 1102 1108 1111 1112 1115 1118 1128 1129 1131 1133 1142 1143 1144 1145 1157 1158 1159 1170 1171 1177 1183 1184 1185 1202 1204 1205 1207 1219 1220 1221 1227 1228 1229 1230 1231 1237 1238 1242 1243 1244 1249 1253 1255 1256 1269 1272 1280 1281 1282 1285 1287 1288 1294 1295 1296 1301 1303 1309 1314 1315 1316 1320 1327 1328 1333 1335 1336 1337 1341 1347 1348 1354 1359 1361 1364 1366 1367 1369 1373 1380 1381 1382 1383 1384 1385 1390 1393 1394 1395 1396 1397 1402 1404 1407 1408 1409 1410 1418 1419 1422 1423 1424 1425 1426 1427 1429 1431 1437 1438 1439 1448 1452 1453 1454 1462 1466 1471 1485 1488 1496 1499 1503 1509 1514 1517 1519 1533 1540 1541 1547 1548 1549 1554 1556 1563 1564 1568 1569 1570 1579 1580 1588 1589 1591 1592 1593 1598 1601 1602 1605 1606 1608 1609 1615 1618 1619 1624 1627 1628 1629 1630 1634 1642 1651 1658 1663 1665 1666 1674 1675 1677 1678 1681 1688 1690 1691 1699 1701 1703 1705 1707 1711 1712 1713 1715 
1717 1718 1722 1725 1731 1732 1735 1736 1739 with 6 links

We see that there are 1786 data points and they have an average of 4.3 neighbours. There are 13 hexagons with 0 neighbours and 454 hexagons with 6 neighbours.

summary(wm_qD$nb)
Neighbour list object:
Number of regions: 1775 
Number of nonzero links: 7636 
Percentage nonzero weights: 0.2423646 
Average number of links: 4.301972 
12 regions with no links:
1 293 362 492 529 654 764 836 1223 1755 1758 1762
Link number distribution:

  0   1   2   3   4   5   6 
 12  55 166 259 389 448 446 
55 least connected regions:
2 3 8 22 23 35 36 56 65 70 110 126 172 198 199 208 209 218 235 251 315 351 377 655 665 699 735 755 805 817 818 843 845 860 897 926 952 979 1165 1177 1178 1208 1258 1260 1321 1433 1443 1498 1507 1545 1558 1635 1669 1763 1764 with 1 link
446 most connected regions:
29 48 71 78 81 82 87 89 94 95 112 119 124 125 131 135 141 144 148 178 185 186 187 193 194 195 204 205 206 216 217 224 230 241 247 249 256 258 259 265 269 271 272 276 277 278 283 284 285 320 324 335 341 343 344 345 348 353 354 357 358 359 360 366 369 370 373 385 386 387 389 390 401 402 405 412 414 415 431 440 444 461 469 474 477 484 489 490 499 505 506 507 513 517 518 519 525 531 533 534 542 549 561 562 571 575 582 583 586 587 592 593 604 615 616 618 624 627 628 629 634 635 638 644 656 657 667 668 672 674 675 684 685 688 691 692 701 702 706 711 712 713 717 724 725 730 731 737 741 742 747 751 760 761 765 770 779 780 785 791 796 797 800 808 809 812 820 821 825 826 834 839 848 854 865 866 869 871 881 882 888 899 900 902 906 909 911 915 916 931 934 935 936 950 951 954 955 956 962 967 974 975 976 977 980 981 987 996 997 998 1001 1002 1008 1011 1012 1015 1016 1018 1019 1020 1021 1027 1028 1032 1033 1034 1035 1037 1038 1039 1044 1053 1054 1057 1058 1059 1062 1063 1067 1072 1076 1077 1084 1085 1086 1088 1092 1093 1094 1096 1099 1100 1101 1107 1110 1111 1114 1117 1127 1128 1130 1132 1141 1142 1143 1144 1156 1157 1158 1169 1170 1176 1182 1183 1184 1201 1203 1204 1206 1218 1219 1220 1226 1227 1228 1229 1230 1236 1237 1241 1242 1243 1248 1252 1254 1255 1268 1271 1279 1280 1281 1284 1286 1287 1293 1294 1295 1300 1302 1308 1313 1314 1315 1319 1326 1327 1332 1334 1335 1336 1340 1346 1347 1353 1358 1360 1363 1365 1366 1368 1372 1379 1380 1381 1382 1383 1384 1389 1392 1393 1394 1395 1396 1401 1403 1406 1407 1408 1409 1417 1418 1421 1422 1423 1424 1425 1426 1428 1430 1436 1437 1438 1447 1451 1452 1453 1461 1465 1470 1485 1488 1496 1499 1503 1509 1514 1517 1519 1532 1539 1540 1546 1547 1548 1553 1555 1562 1563 1567 1568 1569 1578 1579 1587 1588 1590 1591 1592 1597 1600 1601 1604 1605 1607 1608 1614 1617 1618 1623 1626 1627 1628 1629 1633 1641 1650 1657 1662 1664 1665 1673 1674 1676 1677 1680 1687 1689 1690 1698 1700 1702 1704 1706 1710 1711 1712 1714 1716 1717 1721 1724 1730 1731 1734 
1735 1738 with 6 links

We see that there are 1775 data points and they have an average of 4.3 neighbours. There are 12 hexagons with 0 neighbours and 446 hexagons with 6 neighbours.

Neighbour count

Here we will convert the above neighbour counts into data frames for a more direct comparison.

df_countsA <- wm_qA %>%
  st_drop_geometry() %>%
  mutate(vector_length = map_dbl(nb, ~length(.))) %>%
  group_by(vector_length) %>%
  summarise(count = n())

df_countsB <- wm_qB %>%
  st_drop_geometry() %>%
  mutate(vector_length = map_dbl(nb, ~length(.))) %>%
  group_by(vector_length) %>%
  summarise(count = n())

df_countsC <- wm_qC %>%
  st_drop_geometry() %>%
  mutate(vector_length = map_dbl(nb, ~length(.))) %>%
  group_by(vector_length) %>%
  summarise(count = n())

df_countsD <- wm_qD %>%
  st_drop_geometry() %>%
  mutate(vector_length = map_dbl(nb, ~length(.))) %>%
  group_by(vector_length) %>%
  summarise(count = n())


Compiling neighbour count

We will use the bind_rows() function to group all of the neighbour count data together and rearrange them.

result <- bind_rows(
  mutate(df_countsA, dataset = "Weekday 6am to 9am"),
  mutate(df_countsB, dataset = "Weekday 5pm to 8pm"),
  mutate(df_countsC, dataset = "Weekends/holidays 11am to 2pm"),
  mutate(df_countsD, dataset = "Weekends/holidays 4pm to 7pm")
) %>%
  rename("number_of_neighbours" = vector_length, count = count)




result <- result %>%
  group_by(dataset) %>%
  arrange(desc(number_of_neighbours)) %>%
  mutate(number_of_neighbours = factor(number_of_neighbours, levels = unique(number_of_neighbours)))

Plotting neighbour count bar chart

Next we will use ggplot to plot the bar chart for the number of hexagons with differing neighbour count and the different time interval.

ggplot(result, aes(x = number_of_neighbours, y = count, fill = dataset)) +
  stat_summary(fun = "sum", geom = "bar", position = "dodge") +
  scale_fill_manual(values = c(
                   "Weekday 6am to 9am" = "blue4",
                   "Weekday 5pm to 8pm" = "brown2",
                   "Weekends/holidays 11am to 2pm" = "palegreen",
                   "Weekends/holidays 4pm to 7pm" = "mediumpurple1")) +
  labs(title = "Number of Neighbours Counts",
       x = "Number of Neighbours",
       y = "Count") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Overall, we see a decrease in ridership from weekdays to weekends; however, the number of bus stops that are utilised remains fairly constant. This suggests an overall decrease in ridership across all bus stops, while the set of bus stops used for boarding stays largely the same.

Next we will remove all of the data points with 0 neighbours using filter() and the purrr package. In an nb object, a region with no neighbours is encoded as a single 0, which is what the check below looks for.

wm_qA_process <- wm_qA %>%
   filter(!purrr::map_lgl(nb, ~ all(. == 0)))
wm_qB_process <- wm_qB %>%
   filter(!purrr::map_lgl(nb, ~ all(. == 0)))
wm_qC_process <- wm_qC %>%
   filter(!purrr::map_lgl(nb, ~ all(. == 0)))
wm_qD_process <- wm_qD %>%
   filter(!purrr::map_lgl(nb, ~ all(. == 0)))

Spatial weights

wm_qA_weighted <- wm_qA_process %>% 
  mutate(nb = st_contiguity(area_honeycomb_grid),
         wt = st_weights(nb, style = "W"),
         .before = 1)
wm_qB_weighted <- wm_qB_process %>% 
  mutate(nb = st_contiguity(area_honeycomb_grid),
         wt = st_weights(nb, style = "W"),
         .before = 1)
wm_qC_weighted <- wm_qC_process %>% 
  mutate(nb = st_contiguity(area_honeycomb_grid),
         wt = st_weights(nb, style = "W"),
         .before = 1)
wm_qD_weighted <- wm_qD_process %>% 
  mutate(nb = st_contiguity(area_honeycomb_grid),
         wt = st_weights(nb, style = "W"),
         .before = 1)

Local Moran’s I

Here we will be using the local_moran() function to calculate the local Moran’s I for each hexagon.

lisaA <- wm_qA_weighted %>% 
  mutate(local_moran = local_moran(
    TOT_TRIPS, nb, wt, nsim = 99),
         .before = 1) %>%
  unnest(local_moran)
lisaB <- wm_qB_weighted %>% 
  mutate(local_moran = local_moran(
    TOT_TRIPS, nb, wt, nsim = 99),
         .before = 1) %>%
  unnest(local_moran)
lisaC <- wm_qC_weighted %>% 
  mutate(local_moran = local_moran(
    TOT_TRIPS, nb, wt, nsim = 99),
         .before = 1) %>%
  unnest(local_moran)
lisaD <- wm_qD_weighted %>% 
  mutate(local_moran = local_moran(
    TOT_TRIPS, nb, wt, nsim = 99),
         .before = 1) %>%
  unnest(local_moran)

The output will be a data frame containing, among others, the columns ii, eii, var_ii, z_ii, p_ii, p_ii_sim and p_folded_sim.
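We can peek at these statistics directly, for example for the weekday morning peak (a sketch; assumes lisaA from the chunk above):

```r
# Drop the geometry and inspect the local Moran's I columns
lisaA %>%
  st_drop_geometry() %>%
  select(ii, eii, var_ii, z_ii, p_ii, p_ii_sim) %>%
  glimpse()
```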

Combined visualisation of local Moran’s I and p-value

Here we will place the maps next to each other.

tmap_mode("plot")
map1A <- tm_shape(lisaA) +
  tm_fill("ii") + 
  tm_borders(alpha = 0.5) +
  tm_view(set.zoom.limits = c(6,8)) +
  tm_layout(main.title = "local Moran's I of the total number of trips",
            main.title.size = 0.8)

map2A <- tm_shape(lisaA) +
  tm_fill("p_ii",
          breaks = c(0, 0.001, 0.01, 0.05, 1),
              labels = c("0.001", "0.01", "0.05", "Not sig")) + 
  tm_borders(alpha = 0.5) +
  tm_layout(main.title = "p-value of local Moran's I",
            main.title.size = 0.8)

tmap_arrange(map1A, map2A, ncol = 2)

tmap_mode("plot")
map1B <- tm_shape(lisaB) +
  tm_fill("ii") + 
  tm_borders(alpha = 0.5) +
  tm_view(set.zoom.limits = c(6,8)) +
  tm_layout(main.title = "local Moran's I of the total number of trips",
            main.title.size = 0.8)

map2B <- tm_shape(lisaB) +
  tm_fill("p_ii",
          breaks = c(0, 0.001, 0.01, 0.05, 1),
              labels = c("0.001", "0.01", "0.05", "Not sig")) + 
  tm_borders(alpha = 0.5) +
  tm_layout(main.title = "p-value of local Moran's I",
            main.title.size = 0.8)

tmap_arrange(map1B, map2B, ncol = 2)

tmap_mode("plot")
map1C <- tm_shape(lisaC) +
  tm_fill("ii") + 
  tm_borders(alpha = 0.5) +
  tm_view(set.zoom.limits = c(6,8)) +
  tm_layout(main.title = "local Moran's I of the total number of trips",
            main.title.size = 0.8)

map2C <- tm_shape(lisaC) +
  tm_fill("p_ii",
          breaks = c(0, 0.001, 0.01, 0.05, 1),
              labels = c("0.001", "0.01", "0.05", "Not sig")) + 
  tm_borders(alpha = 0.5) +
  tm_layout(main.title = "p-value of local Moran's I",
            main.title.size = 0.8)

tmap_arrange(map1C, map2C, ncol = 2)

tmap_mode("plot")
map1D <- tm_shape(lisaD) +
  tm_fill("ii") + 
  tm_borders(alpha = 0.5) +
  tm_view(set.zoom.limits = c(6,8)) +
  tm_layout(main.title = "local Moran's I of the total number of trips",
            main.title.size = 0.8)

map2D <- tm_shape(lisaD) +
  tm_fill("p_ii",
          breaks = c(0, 0.001, 0.01, 0.05, 1),
              labels = c("0.001", "0.01", "0.05", "Not sig")) + 
  tm_borders(alpha = 0.5) +
  tm_layout(main.title = "p-value of local Moran's I",
            main.title.size = 0.8)

tmap_arrange(map1D, map2D, ncol = 2)

Visualisation of LISA

Here we will visualise LISA, where we can see the presence of outliers and clusters. More information can be found here. The following is a newer method for calculating LISA that requires shorter and more concise steps, such as not having to manually form the high-high, high-low, low-high and low-low quadrants. Note that if the data is skewed, we should use the median instead of the mean for forming the quadrants.

lisa_sigA <- lisaA  %>%
  filter(p_ii < 0.05)
tmap_mode("view")
tm_shape(lisaA) +
  tm_polygons() +
  tm_borders(alpha = 0.5) +
tm_shape(lisa_sigA) +
  tm_fill("mean") + 
  tm_borders(alpha = 0.4)
lisa_sigB <- lisaB  %>%
  filter(p_ii < 0.05)
tmap_mode("view")
tm_shape(lisaB) +
  tm_polygons() +
  tm_borders(alpha = 0.5) +
tm_shape(lisa_sigB) +
  tm_fill("mean") + 
  tm_borders(alpha = 0.4)
lisa_sigC <- lisaC  %>%
  filter(p_ii < 0.05)
tmap_mode("view")
tm_shape(lisaC) +
  tm_polygons() +
  tm_borders(alpha = 0.5) +
tm_shape(lisa_sigC) +
  tm_fill("mean") + 
  tm_borders(alpha = 0.4)
lisa_sigD <- lisaD  %>%
  filter(p_ii < 0.05)
tmap_mode("view")
tm_shape(lisaD) +
  tm_polygons() +
  tm_borders(alpha = 0.5) +
tm_shape(lisa_sigD) +
  tm_fill("mean") + 
  tm_borders(alpha = 0.4)
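If TOT_TRIPS is heavily skewed, the quadrant labels in the median column returned by local_moran() can be substituted for the mean column used above (a sketch; assumes lisaA and lisa_sigA from the chunks above):

```r
# Sketch: same map as above, but labelling clusters with the
# median-based quadrants instead of the mean-based ones
tmap_mode("view")
tm_shape(lisaA) +
  tm_polygons() +
  tm_borders(alpha = 0.5) +
tm_shape(lisa_sigA) +
  tm_fill("median") +
  tm_borders(alpha = 0.4)
```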

Additional processing

Passenger flow visualisation

Origin and destination grouping

We can also take a look at passenger flow between the different hexagons by plotting desire lines. We will first need to obtain the total number of trips in each interval, grouped by their origin followed by their destination.

des_weekdayAM_6_9 <- odbus %>% 
  filter(DAY_TYPE == "WEEKDAY") %>%
  filter(TIME_PER_HOUR >= 6 & TIME_PER_HOUR <= 9) %>%
  group_by(ORIGIN_PT_CODE,
           DESTINATION_PT_CODE) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS))

des_weekdayPM_5_8 <- odbus %>% 
  filter(DAY_TYPE == "WEEKDAY") %>%
  filter(TIME_PER_HOUR >= 17 & TIME_PER_HOUR <= 20) %>%
  group_by(ORIGIN_PT_CODE,
           DESTINATION_PT_CODE) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS))

des_weekendAM_11_14 <- odbus %>% 
  filter(DAY_TYPE == "WEEKENDS/HOLIDAY") %>%
  filter(TIME_PER_HOUR >= 11 & TIME_PER_HOUR <= 14) %>%
  group_by(ORIGIN_PT_CODE,
           DESTINATION_PT_CODE) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS))

des_weekendPM_16_19 <- odbus %>% 
  filter(DAY_TYPE == "WEEKENDS/HOLIDAY") %>%
  filter(TIME_PER_HOUR >= 16 & TIME_PER_HOUR <= 19) %>%
  group_by(ORIGIN_PT_CODE,
           DESTINATION_PT_CODE) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS))
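The grouping step can be illustrated on a tiny made-up table (hypothetical stop codes, and base R aggregate() in place of dplyr) to show that rows sharing the same origin-destination pair, e.g. from different hours, collapse into one summed row:

```r
# Toy OD records: two hours of trips between hypothetical stop codes.
odtoy <- data.frame(
  ORIGIN_PT_CODE      = c("01012", "01012", "01013"),
  DESTINATION_PT_CODE = c("01113", "01113", "01113"),
  TOTAL_TRIPS         = c(40, 25, 10)
)

# Summing TOTAL_TRIPS within each origin-destination pair mirrors the
# group_by() + summarise(TRIPS = sum(TOTAL_TRIPS)) calls above.
odsum <- aggregate(TOTAL_TRIPS ~ ORIGIN_PT_CODE + DESTINATION_PT_CODE,
                   data = odtoy, FUN = sum)
odsum
```

The two 01012 to 01113 rows are combined into a single row with 65 trips, which is exactly what the summarise() calls do at full scale.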

Joining destination table and grid

Using the grid that we created at the beginning, we will now join the destination data with the grid.

combn_des_weekdayAM_6_9 <- left_join(des_weekdayAM_6_9,busstop_hex) %>%
  rename(ORIGIN_BS = ORIGIN_PT_CODE,
         ORIGIN_GRID = grid_id,
         DESTIN_BS = DESTINATION_PT_CODE)

combn_des_weekdayPM_5_8 <- left_join(des_weekdayPM_5_8,busstop_hex) %>%
  rename(ORIGIN_BS = ORIGIN_PT_CODE,
         ORIGIN_GRID = grid_id,
         DESTIN_BS = DESTINATION_PT_CODE)

combn_des_weekendAM_11_14 <- left_join(des_weekendAM_11_14,busstop_hex) %>%
  rename(ORIGIN_BS = ORIGIN_PT_CODE,
         ORIGIN_GRID = grid_id,
         DESTIN_BS = DESTINATION_PT_CODE)

combn_des_weekendPM_16_19 <- left_join(des_weekendPM_16_19,busstop_hex) %>%
  rename(ORIGIN_BS = ORIGIN_PT_CODE,
         ORIGIN_GRID = grid_id,
         DESTIN_BS = DESTINATION_PT_CODE)

Removing duplicates

We will then remove any duplicates using the unique() function.

uni_des_weekdayAM_6_9 <- unique(combn_des_weekdayAM_6_9)
uni_des_weekdayPM_5_8 <- unique(combn_des_weekdayPM_5_8)
uni_des_weekendAM_11_14 <- unique(combn_des_weekendAM_11_14)
uni_des_weekendPM_16_19 <- unique(combn_des_weekendPM_16_19)
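To see why duplicates can appear, consider a hypothetical lookup table in which the same bus stop code occurs twice (e.g. a stop recorded once per overlapping service). A join then multiplies the matching rows, and unique() collapses the exact copies (base-R merge() stands in for left_join() here):

```r
# Hypothetical trip record and a lookup table with a duplicated stop.
trips  <- data.frame(ORIGIN_BS = "01012", TRIPS = 65)
lookup <- data.frame(ORIGIN_BS = c("01012", "01012"),  # duplicated record
                     grid_id   = c(7L, 7L))

joined <- merge(trips, lookup, by = "ORIGIN_BS")  # base-R analogue of left_join()
nrow(joined)          # 2 rows: the join duplicated the trip record
nrow(unique(joined))  # 1 row after removing exact duplicates
```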

Rejoin hexagonal information

We can then rejoin the hexagonal information using the left_join() function, this time matching the destination bus stops to their hexagons.

poly_des_weekdayAM_6_9 <- left_join(uni_des_weekdayAM_6_9 , busstop_hex,
            by = c("DESTIN_BS" = "ORIGIN_PT_CODE"))
poly_des_weekdayPM_5_8 <- left_join(uni_des_weekdayPM_5_8 , busstop_hex,
            by = c("DESTIN_BS" = "ORIGIN_PT_CODE"))
poly_des_weekendAM_11_14 <- left_join(uni_des_weekendAM_11_14 , busstop_hex,
            by = c("DESTIN_BS" = "ORIGIN_PT_CODE"))
poly_des_weekendPM_16_19 <- left_join(uni_des_weekendPM_16_19 , busstop_hex,
            by = c("DESTIN_BS" = "ORIGIN_PT_CODE"))

Recheck unique data

We will then recheck for unique data points by running unique() again.

uniP_des_weekdayAM_6_9 <- unique(poly_des_weekdayAM_6_9)
uniP_des_weekdayPM_5_8 <- unique(poly_des_weekdayPM_5_8)
uniP_des_weekendAM_11_14 <- unique(poly_des_weekendAM_11_14)
uniP_des_weekendPM_16_19 <- unique(poly_des_weekendPM_16_19)

Sum total trips

We will then sum up the total number of trips made from the hexagon of origin to the hexagon of destination, using group_by() for sorting. Grouping by multiple columns sub-categorises the data, so we can track both where passengers board and where they alight.

ori_des_weekdayAM_6_9 <- uniP_des_weekdayAM_6_9 %>%
  rename(DESTIN_GRID = grid_id) %>%
  drop_na() %>%
  group_by(ORIGIN_GRID, DESTIN_GRID) %>%
  summarise(PEAK = sum(TRIPS))

ori_des_weekdayPM_5_8 <- uniP_des_weekdayPM_5_8 %>%
  rename(DESTIN_GRID = grid_id) %>%
  drop_na() %>%
  group_by(ORIGIN_GRID, DESTIN_GRID) %>%
  summarise(PEAK = sum(TRIPS))

ori_des_weekendAM_11_14 <- uniP_des_weekendAM_11_14 %>%
  rename(DESTIN_GRID = grid_id) %>%
  drop_na() %>%
  group_by(ORIGIN_GRID, DESTIN_GRID) %>%
  summarise(PEAK = sum(TRIPS))

ori_des_weekendPM_16_19 <- uniP_des_weekendPM_16_19 %>%
  rename(DESTIN_GRID = grid_id) %>%
  drop_na() %>%
  group_by(ORIGIN_GRID, DESTIN_GRID) %>%
  summarise(PEAK = sum(TRIPS))

Visualisation

We will first remove any intra-hexagonal travel.

R_weekdayAM_6_9 <- ori_des_weekdayAM_6_9[ori_des_weekdayAM_6_9$ORIGIN_GRID!=ori_des_weekdayAM_6_9$DESTIN_GRID,]
R_weekdayPM_5_8 <- ori_des_weekdayPM_5_8[ori_des_weekdayPM_5_8$ORIGIN_GRID!=ori_des_weekdayPM_5_8$DESTIN_GRID,]
R_weekendAM_11_14 <- ori_des_weekendAM_11_14[ori_des_weekendAM_11_14$ORIGIN_GRID!=ori_des_weekendAM_11_14$DESTIN_GRID,]
R_weekendPM_16_19 <- ori_des_weekendPM_16_19[ori_des_weekendPM_16_19$ORIGIN_GRID!=ori_des_weekendPM_16_19$DESTIN_GRID,]

Create desire lines

Next we will visualise the flows, or connections, between the different hexagons using the od2line() function from the stplanr package.

flow_weekdayAM_6_9 <- od2line(flow = R_weekdayAM_6_9, 
                    zones = honeycomb_grid_sf,
                    zone_code = "grid_id")

flow_weekdayPM_5_8 <- od2line(flow = R_weekdayPM_5_8, 
                    zones = honeycomb_grid_sf,
                    zone_code = "grid_id")

flow_weekendAM_11_14 <- od2line(flow = R_weekendAM_11_14, 
                    zones = honeycomb_grid_sf,
                    zone_code = "grid_id")

flow_weekendPM_16_19 <- od2line(flow = R_weekendPM_16_19, 
                    zones = honeycomb_grid_sf,
                    zone_code = "grid_id")
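Conceptually, od2line() connects the centroid of each origin zone to the centroid of each destination zone. A minimal base-R sketch of that idea on toy coordinates (hypothetical centroids, not the sf LINESTRING geometry that stplanr actually builds):

```r
# Hypothetical hexagon centroids (grid_id -> x/y), and one OD flow.
centroids <- data.frame(grid_id = c(1, 2),
                        x = c(0, 3), y = c(0, 4))
flow <- data.frame(ORIGIN_GRID = 1, DESTIN_GRID = 2, PEAK = 1500)

# Look up the start and end points of the desire line.
o <- centroids[match(flow$ORIGIN_GRID, centroids$grid_id), c("x", "y")]
d <- centroids[match(flow$DESTIN_GRID, centroids$grid_id), c("x", "y")]

# od2line() would return a line geometry; here we just report its length.
len <- sqrt((d$x - o$x)^2 + (d$y - o$y)^2)
len  # 5
```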

Visualise desire lines

We can then visualise these flows using the following code. Since overall passenger flow is high, it is better to limit the visualisation; here we will only plot flows of 1,500 passenger trips or more between hexagons.

tm_shape(grid_weekdayAM) +
  tm_polygons() +
flow_weekdayAM_6_9 %>%  
  filter(PEAK >= 1500) %>%
tm_shape() +
  tm_lines(lwd = "PEAK",
           style = "quantile",
           scale = c(0.1, 1, 3, 5, 7, 10),
           n = 6,
           alpha = 0.3)
tm_shape(grid_weekdayPM) +
  tm_polygons() +
flow_weekdayPM_5_8 %>%  
  filter(PEAK >= 1500) %>%
tm_shape() +
  tm_lines(lwd = "PEAK",
           style = "quantile",
           scale = c(0.1, 1, 3, 5, 7, 10),
           n = 6,
           alpha = 0.3)
tm_shape(grid_weekendAM) +
  tm_polygons() +
flow_weekendAM_11_14 %>%  
  filter(PEAK >= 1500) %>%
tm_shape() +
  tm_lines(lwd = "PEAK",
           style = "quantile",
           scale = c(0.1, 1, 3, 5, 7, 10),
           n = 6,
           alpha = 0.3)
tm_shape(grid_weekendPM) +
  tm_polygons() +
flow_weekendPM_16_19 %>%  
  filter(PEAK >= 1500) %>%
tm_shape() +
  tm_lines(lwd = "PEAK",
           style = "quantile",
           scale = c(0.1, 1, 3, 5, 7, 10),
           n = 6,
           alpha = 0.3)